Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support providing multi config files #336

Merged
merged 53 commits into from
Oct 30, 2024
Merged

Support providing multi config files #336

merged 53 commits into from
Oct 30, 2024

Conversation

pearsonca
Copy link
Contributor

@pearsonca pearsonca commented Oct 9, 2024

Describe your changes.

This PR introduces support for using multiple configuration files. Briefly, commands like

flepimop COMMAND -c config.yml

will now (and going forward, more easily) support

flepimop COMMAND config1.yml config2.yml config3.yml ...

n.b. the -c flag is no longer required for specifying config files.

This PR adds the patch command:

flepimop patch config1.yml config2.yml > config_combined.yml

which will produce the monolithic version of config file(s). Also, the specific options (e.g. write-csv, run_id, ...) are now also incorporated into the config; patch will capture those options and incorporate them into the resulting config file.

This PR deprecates gempyor-simulate ..., in favor of flepimop simulate ...; gempyor-simulate ... still works, but issues a deprecation warning. The -c option is not yet deprecated by this PR, but we anticipate it will be in the future.

Documentation Updates

This PR includes new example partial YAML configurations. The gitbook documentation is updated to replace gempyor-simulate with flepimop simulation and removes the -c from those examples. There is also a new gitbook documentation section added explaining the use of multiple configuration files.

@pearsonca
Copy link
Contributor Author

@jcblemai i addressed our in person discussion issues:

  • individual component cli is pushed back out to the "implementing" class for compartments
  • i removed the deprecation warning for -c. it still issues a warning when the arguments-based approach is used, indicating that new approach overrides the -c approach

@saraloo saraloo removed the request for review from kjsato October 11, 2024 18:37
@pearsonca
Copy link
Contributor Author

Righto, the CI issues appear to be passing now.

There's a lot of faff going on with passing around config. Troubleshooting the problems here hit against that design decision several times.

Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start, left comments on the big things, but I certainly missed smaller things. The big points I want to emphasize are:

  1. It doesn't look like there were any unit tests added for any of the new functionality which would be really helpful for maintaining the new features over time,
  2. There is a general lack of documentation/type hints which would be helpful in understanding what the new functionality does and allowing others to integrate it into their work,
  3. Lots of small stylistic changes, but importantly ones that help clean up the diff would be helpful for reviewing,
  4. A formatter like black is recommended for new files, and
  5. It's not clear to me that these changes are backwards compatible, if anything the changes to examples/test_cli.py suggest that they are not, which I think would create a large hurdle for operations. Is there a way to make these changes backwards compatible? Or if they are backwards compatible can the changes to examples/test_cli.py reflect that by providing the config both ways for all of the fixtures?

flepimop/gempyor_pkg/src/gempyor/cli.py Outdated Show resolved Hide resolved
Comment on lines 4 to 7
# Guidance for extending the CLI:
# - to add a new small command to the CLI, can just add a new function with the @cli.command() decorator here (e.g. patch below)
# - to add something with lots of module logic in it, should define that in the module (e.g. compartments for a group command, or simulate for a single command)
# - ... and then import that module here to add it to the CLI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go into the wiki instead of inline in the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe - I think it's useful to have at hand when editing, though I agree we should be updating a contribution / style guide as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think putting in the wiki would be preferable, it would provide more space to give commentary and even work through a small example.

flepimop/gempyor_pkg/src/gempyor/cli.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/config_validator.py Outdated Show resolved Hide resolved
@@ -142,7 +142,7 @@ def __init__(

# SEIR modifiers
self.npi_config_seir = None
if config["seir_modifiers"].exists():
if config["seir_modifiers"].exists() and self.seir_modifiers_scenario is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the need that caused this change (as well as the similar one for outcome modifiers)? Is there a corresponding unit test to go with this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem here is passing both the config and the scenario. These are merged at the CLI boundary, but then internally, the whole config is still passed around and subset by X_modifiers_scenario

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is definitely a get-this-working change - there needs to be a more thorough overall here, which would obviate this problem)

flepimop/gempyor_pkg/src/gempyor/shared_cli.py Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation, type hints, and __all__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might just prune this out, since I don't use it (@jcblemai didn't want to mark -c deprecated). That said, it would be useful for deprecated options to the CLI - what do folks think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't actually used then delete it. If we need it in the future we can dig it up again, but I'd rather not maintain something that isn't used.

flepimop/gempyor_pkg/src/gempyor/simulate.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/simulate.py Outdated Show resolved Hide resolved
result = runner.invoke(simulate, ['-c', 'config_sample_2pop.yml'])
result = runner.invoke(simulate, ['config_sample_2pop.yml'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes backwards compatible? I don't think that was entirely clear to me, and this seems to indicate that it is not. If it is, can this test be parameterized such that it hits both cases to prove backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are - note line 33 below. I changed these to demonstrate the new capabilities, and left the last one to demo backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've also added a test demonstrating the re-route of gempyor-simulate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It demonstrates the new capabilities, but I don't think it demonstrates backwards compatibility. I think ideally these tests would be parameterized such that they're invoked via the old method and the new method for each fixture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Let's chat later today about the right testing regime here - I don't think we should need to reach all the way into execution to confirm that the resulting configs / arguments passed are identical. Might warrant a bit of re-architecture / reframing of tests generally to do that, however.

Copy link
Contributor

@TimothyWillard TimothyWillard Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should need to reach all the way into execution to confirm that the resulting configs / arguments passed are identical.

Maybe, I think I would be more swayed in that not being needed if there were other unit tests to accompany this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - but I also think it's going to entail re-orienting the click approach a bit. Right now, parse happens in the commands, rather than in the main group, then dispatched down.

I think its definitely preferrable to centralize this even more (i.e. not just in one function, but into one call of that function), but that's maybe a more radical overhaul of the function signatures.

@TimothyWillard
Copy link
Contributor

It's not clear to me that these changes are backwards compatible, if anything the changes to examples/test_cli.py suggest that they are not, which I think would create a large hurdle for operations. Is there a way to make these changes backwards compatible? Or if they are backwards compatible can the changes to examples/test_cli.py reflect that by providing the config both ways for all of the fixtures?

Actually, looking again more closely I think it is backwards compatible, but I think my confusion at first glance highlights the need to cover both cases a bit more in that particular file as well as add documentation generally.

@pearsonca
Copy link
Contributor Author

Ready for re-review

  1. It doesn't look like there were any unit tests added for any of the new functionality which would be really helpful for maintaining the new features over time,

I added items to test_cli - do you mean for the internal functions, like the option / config merger?

  1. There is a general lack of documentation/type hints which would be helpful in understanding what the new functionality does and allowing others to integrate it into their work,

Added some of this, but could use some more perspective on what else is needed.

  1. Lots of small stylistic changes, but importantly ones that help clean up the diff would be helpful for reviewing,

Did these.

  1. A formatter like black is recommended for new files, and

Applied it.

  1. It's not clear to me that these changes are backwards compatible, if anything the changes to examples/test_cli.py suggest that they are not, which I think would create a large hurdle for operations. Is there a way to make these changes backwards compatible? Or if they are backwards compatible can the changes to examples/test_cli.py reflect that by providing the config both ways for all of the fixtures?

It should be backwards compatible; I'll have a think about how best to include all the ways. Will mean re-assembling the full configs, I guess.

@pearsonca
Copy link
Contributor Author

Re still failing unit test case - not sure what the deal there is. Works locally?

@saraloo
Copy link
Contributor

saraloo commented Oct 14, 2024

Haven't gone through this in full detail, but re: the tutorials, I would lean towards leaving the current configs as are so they are standalone readable, and including an example where you patch two configs together.

@pearsonca
Copy link
Contributor Author

Haven't gone through this in full detail, but re: the tutorials, I would lean towards leaving the current configs as are so they are standalone readable, and including an example where you patch two configs together.

Reverted back to the full ymls - where do the examples live? Happy to add there.

@saraloo
Copy link
Contributor

saraloo commented Oct 14, 2024

Haven't gone through this in full detail, but re: the tutorials, I would lean towards leaving the current configs as are so they are standalone readable, and including an example where you patch two configs together.

Reverted back to the full ymls - where do the examples live? Happy to add there.
Sorry, meant in the same folder - could just add your .part versions and and in any eventual documentation note that an original config + additional part is the equivalent to one of the full configs, or something like that? Pretty straight forward, but i think there's value in keeping the tutorial/example configs that have all the pieces.

@pearsonca
Copy link
Contributor Author

Sorry, meant in the same folder - could just add your .part versions and and in any eventual documentation note that an original config + additional part is the equivalent to one of the full configs, or something like that? Pretty straight forward, but i think there's value in keeping the tutorial/example configs that have all the pieces.

Alright - so the latest is good then? Happy to modify any existing examples, but it's hard to know which are relevant / where they are.

Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvements look good, but I think we're still missing documentation and unit tests which are essential to maintaining this functionality over time and covering edge cases now. Here are some PRs with good test/documentation changes to use as examples:

  1. Document/Test gempyor.utils.read_df/write_df #247
  2. Document And Test gempyor.file_paths #250
  3. Update utils.py #260
  4. Document/Unit Test gempyor.parameters #277
  5. Document/Unit Test gempyor.statistics #304

We're pushing toward the Google style guide for documentation, and expanded set of examples can be found here provided by Napoleon.

And on the unit testing front I think at a minimum unit tests for cli, option_config_files, and parse_config_files need to be implemented. See pytest's getting started guide or the PRs above for examples.

result = runner.invoke(simulate, ['-c', 'config_sample_2pop.yml'])
result = runner.invoke(simulate, ['config_sample_2pop.yml'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It demonstrates the new capabilities, but I don't think it demonstrates backwards compatibility. I think ideally these tests would be parameterized such that they're invoked via the old method and the new method for each fixture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the .yml extension be added to the file name for syntax highlighting? This looks like valid YAML on its own. And same for the other .part files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'll pull the part indicator into the file name.

Comment on lines 9 to 11
"""
An internal module to share common CLI elements.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module docstrings must go at the top.

argument_config_files = click.argument(
"config_files", nargs=-1, type=click.Path(exists=True)
)
""" `click` Argument decorator to handle configuration file(s) """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python does not support docstrings for variables, can just make these comments.

Comment on lines 135 to 136
config_files: list[str],
config_filepath: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a str, is a pathlib.Path. At least that's what the corresponding click argument/option suggests, you could coerce to a string before providing to this function.

first_sim_index: int,
stoch_traj_flag: bool,
) -> None:
"""Parse the configuration file(s) and override with command line arguments"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what is the right way to avoid double work here? The click options are in my mind the right place to have the documentation, and this def is not intended to public interface. I don't want us to get into double maintaining / inconsistency between the allowed options / arguments to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

along those lines: I propose a separate issue/PR here for a bit more focused refactor here to make options and such be a bit more dynamic. Swap to a kwargs-y approach, make the options list defined earlier in the file be named and use those names in the loops.

Does that seem reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way to avoid double work here is to put the documentation with this function. That's fine that they're not intended for the public interface (we should start denoting internal with a leading underscore if we're going to take care to define an external vs internal interface at this stage), but that doesn't exclude it from requiring documentation. We still need to know how to use the function internally when building out future CLI interfaces.

I'm not sure I follow the second comment that well. Why would we move to kwargs for parse_config_files if we're providing them every time (at least from what I see in current usage) anyways, do we anticipate only providing a few of these args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if parse_config_files is kwargs-driven, then effectively the arguments can be driven by the names set in the arguments/options objects. Then whatever we do with those options in the future (and maybe that other people do with options for plugins), the parse function "just works". that clearer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you mean using **kwargs. Sure, that could be done now or in a future PR doesn't matter too much either way, but it doesn't alleviate the need for documentation. One downside of **kwargs is how to construct a valid call to this function becomes a bit more ambiguous since the valid arguments are not enumerated in the function definition itself. That could be addressed by well maintained documentation though.

@pearsonca
Copy link
Contributor Author

pearsonca commented Oct 15, 2024

Summarizing conversation @TimothyWillard and I had:

  • add to test_cli using the patch action, and confirm that it matches the non-partitioned file
  • introduce unit testing for parse_config_files, checking that the purportedly supported options can all be reflected in the config file
  • otherwise prune the test_cli back to reflect more of its prior checks
  • exploring changing the approach in shared_cli to use kwargs in parse_config_files (TBD if this makes sense, after seeing what is possible with documentation, argument checking, etc.)

@pearsonca
Copy link
Contributor Author

  • exploring changing the approach in shared_cli to use kwargs in parse_config_files (TBD if this makes sense, after seeing what is possible with documentation, argument checking, etc.)

alright @TimothyWillard - how's the rough in strike you?

@TimothyWillard TimothyWillard self-requested a review October 16, 2024 14:25
Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing unit tests for parse_config_files. I'm a bit concerned about:

  1. How the documentation for parse_config_files is formatted as well as the parameters being given. I thought the take away was to use **kwargs rather than specify exact args and then list what fields it would look for in a Notes: section or similar. The current documentation is oddly formatted and not super helpful from a developer prospective.
  2. I'm also a bit confused about the --help view of the commands, for example the simulate command's output has expanded with a lot of extra info that isn't helpful, I think coming from the click_helpstring decorator, but on the other hand the flepimop compartments subcommands help pages don't list any of the options/arguments? I think ideally we want a consistent display with the acceptable options/arguments?

examples/test_cli.py Outdated Show resolved Hide resolved
examples/test_cli.py Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/shared_cli.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/shared_cli.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/simulate.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/simulate.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/shared_cli.py Outdated Show resolved Hide resolved
TimothyWillard
TimothyWillard previously approved these changes Oct 21, 2024
Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments, but overall I'm excited about this PR. I think this sets the stage for a very simple and cohesive CLI interface. The unit testing also makes this maintainable over time.

examples/test_cli.py Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/shared_cli.py Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean we can delete examples/test_cli.py then? As far as correcting the CI goes it would just require removing the "Run gempyor-cli integration tests from examples" step from .github/workflows/gempyor-ci.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue yes. @jcblemai any reason to keep the original version of this around?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still have code that runs the actual examples in the example folder, but we should rename it to test_examples perhaps ?


You may provide an arbitrary number of separate configuration files to combine to create a complete configuration.

At this time, only `simulate` supports multiple configuration files. Also, the patching operation is fairly crude: configuration options override previous ones completely, though with a warning. The files provided from left to right are from lowest priority (i.e. for the first file, only options specified in no other files are used) to highest priority (i.e. for the last file, its options override any other specification).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some examples would be informative to show the limitations. This could lead to very unexpected behaviour and since we have some external users, i think it would be useful to be explicit here about what this cannot do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will amend this - how about addressing, say, new seir scenarios?

@TimothyWillard
Copy link
Contributor

Noting Joseph's comment regarding shadowing. I think this needs to be tested or some specifics about what happens should be determined first, or at least a strict warning or error here. My instinct reading this was - great I can now add modifiers to a section that is pre-existing, but is this what happens? Would this ever fail, or remove preexisting configurations? From a run-to-run perspective this would be the most useful - the ability to just add parts within chunks, or even to specify how to merge two config sections. This could be moved to its own issue at a later date if we go with just issuing a warning if two config parts have the same sections.

We're thinking about how to be clever-er here, but a lot of that is going to rely on how you want it to work - there are some standards for this kind of thing (JSON patch syntax, e.g.) but they don't feel great as an approach to ask users to write. I think this is a "next issue" item, once people have had time to experiment and recognize the rough edges, try things they think should work / would like to work, etc.

I think in the future we can add flags that dictate this behavior. Say a --merge for merging modifiers together from multiple files or --no-override to avoid overriding sections, for example. I agree that it's best to determine this based on feedback, just providing an example of how to incorporate feedback in the future in a backwards compatible way.

@pearsonca pearsonca merged commit 1f92081 into dev Oct 30, 2024
2 checks passed
@jcblemai jcblemai deleted the multi-config branch October 30, 2024 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants